AITopics | Loreto Department

Loreto Department

Long-form factuality in large language models

Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu

Neural Information Processing Systems, Nov-19-2025, 21:47:41 GMT

To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE).

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > South Australia > Adelaide (0.14)
(48 more...)

Genre:

Research Report > Experimental Study (1.00)
Personal > Honors (0.67)

Industry:

Media > Television (1.00)
Media > Music (1.00)
Media > Film (1.00)
(23 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

937ae0e83eb08d2cb8627fe1def8c751-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 09:59:25 GMT

factuality, individual fact, language model, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > South Australia > Adelaide (0.14)
(50 more...)

Genre:

Research Report > Experimental Study (1.00)
Personal > Honors (0.67)
Overview (0.67)

Industry:

Media > Television (1.00)
Media > Music (1.00)
Media > Film (1.00)
(22 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?

Neural Information Processing Systems, Oct-9-2025, 12:38:55 GMT

Although pruning-at-initialization (PaI) methods manage to find trainable subnetworks that outperform random pruning, their performance in terms of both accuracy and computational reduction is far from satisfactory compared to post-training pruning, and the understanding of PaI is missing.

machine learning, natural language, subnetwork, (20 more...)

Neural Information Processing Systems

Country:

South America > Peru > Loreto Department (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Cardioformer: Advancing AI in ECG Analysis with Multi-Granularity Patching and ResNet

Mobin, Md Kamrujjaman, Islam, Md Saiful, Barid, Sadik Al, Masum, Md

arXiv.org Artificial Intelligence, May-12-2025

Electrocardiogram (ECG) classification is crucial for automated cardiac disease diagnosis, yet existing methods often struggle to capture local morphological details and long-range temporal dependencies simultaneously. To address these challenges, we propose Cardioformer, a novel multi-granularity hybrid model that integrates cross-channel patching, hierarchical residual learning, and a two-stage self-attention mechanism. Cardioformer first encodes multi-scale token embeddings to capture fine-grained local features and global contextual information and then selectively fuses these representations through intra- and inter-granularity self-attention. Extensive evaluations on three benchmark ECG datasets under subject-independent settings demonstrate that the model consistently outperforms four state-of-the-art baselines. Our Cardioformer model achieves AUROCs of 96.34 ± 0.11, 89.99 ± 0.12, and 95.59 ± 1.66 on the MIMIC-IV, PTB-XL, and PTB datasets, respectively, outperforming the PatchTST, Reformer, Transformer, and Medformer models. It also demonstrates strong cross-dataset generalization, achieving 49.18% AUROC on PTB and 68.41% on PTB-XL when trained on MIMIC-IV. These findings underscore the potential of Cardioformer to advance automated ECG analysis, paving the way for more accurate and robust cardiovascular disease diagnosis. We release the source code at https://github.com/KMobin555/Cardioformer.

artificial intelligence, cardioformer, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.05538

Country:

South America > Peru > Loreto Department (0.04)
North America > United States (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

An Efficient GPU-based Implementation for Noise Robust Sound Source Localization

Lin, Zirui, Takigahira, Masayuki, Terakado, Naoya, Gulzar, Haris, Busto, Monikka Roslianna, Eda, Takeharu, Itoyama, Katsutoshi, Nakadai, Kazuhiro, Amano, Hideharu

arXiv.org Artificial Intelligence, May-9-2025

Dept. of Information and Computer Science, Keio University, Kanagawa, Japan. Email: hunga@am.ics.keio.ac.jp

Abstract: Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Despite their wide applicability, processing multi-channel audio signals from microphone arrays in SSL involves computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition, utilizing the Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within the HARK platform, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 5648.7× for GSVD calculations and 10.7× for the SSL module. On a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs, we observe speedups of 4245.1× for GSVD calculations and 17.3× for the entire SSL module. These gains make real-time processing feasible for large-scale microphone arrays and provide ample capacity for real-time processing of potential subsequent machine learning or deep learning tasks.

Introduction: Audition is a critical aspect of human inter-individual communication [1].

artificial intelligence, implementation, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

2504.03373

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.24)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(7 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.87)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Disease Outbreak Detection and Forecasting: A Review of Methods and Data Sources

Babanejaddehaki, Ghazaleh, An, Aijun, Papagelis, Manos

arXiv.org Artificial Intelligence, Oct-21-2024

Infectious diseases occur when pathogens from other individuals or animals infect a person, resulting in harm to both individuals and society as a whole. The outbreak of such diseases can pose a significant threat to human health. However, early detection and tracking of these outbreaks have the potential to reduce the mortality impact. To address these threats, public health authorities have endeavored to establish comprehensive mechanisms for collecting disease data. Many countries have implemented infectious disease surveillance systems, with the detection of epidemics being a primary objective. The clinical healthcare system, local/state health agencies, federal agencies, academic/professional groups, and collaborating governmental entities all play pivotal roles within this system. Moreover, nowadays, search engines and social media platforms can serve as valuable tools for monitoring disease trends. The Internet and social media have become significant platforms where users share information about their preferences and relationships. This real-time information can be harnessed to gauge the influence of ideas and societal opinions, making it highly useful across various domains and research areas, such as marketing campaigns, financial predictions, and public health, among others. This article provides a review of the existing standard methods developed by researchers for detecting outbreaks using time series data. These methods leverage various data sources, including conventional data sources and social media data or Internet data sources. The review particularly concentrates on works published within the timeframe of 2015 to 2022.

bioinformatics, machine learning, real time system, (19 more...)

arXiv.org Artificial Intelligence

2410.1729

Country:

Europe > United Kingdom (0.14)
Asia > Japan (0.14)
Asia > South Korea (0.14)
(42 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Internal Medicine (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(3 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(9 more...)

Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery

Cui, Kangning, Tang, Wei, Zhu, Rongkun, Wang, Manqi, Larsen, Gregory D., Pauca, Victor P., Alqahtani, Sarra, Yang, Fan, Segurado, David, Fine, Paul, Karubian, Jordan, Chan, Raymond H., Plemmons, Robert J., Morel, Jean-Michel, Silman, Miles R.

arXiv.org Artificial Intelligence, Oct-14-2024

Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.

artificial intelligence, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.11124

Country:

South America > Ecuador (0.24)
South America > Colombia (0.24)
North America > United States > California > Alameda County > Berkeley (0.14)
(9 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Information Technology (0.67)
Education (0.46)
Materials (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Demircan, Can, Saanum, Tankred, Jagadish, Akshay K., Binz, Marcel, Schulz, Eric

arXiv.org Artificial Intelligence, Oct-2-2024

In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs). However, as LLMs' in-context learning abilities continue to improve, understanding this phenomenon mechanistically becomes increasingly important. In particular, it is not well understood how LLMs learn to solve specific classes of problems, such as reinforcement learning (RL) problems, in-context. Through three different tasks, we first show that Llama 3 70B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors. Notably, these representations emerge despite the model only being trained to predict the next token. We verify that these representations are indeed causally involved in the computation of TD errors and Q-values by performing carefully designed interventions on them. Taken together, our work establishes a methodology for studying and manipulating in-context learning with SAEs, paving the way for a more mechanistic understanding.
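
For readers unfamiliar with the quantity the abstract refers to, the following is a minimal sketch of a TD error under the standard Q-learning formulation. It is an illustration only: the abstract does not state which TD variant the authors compare against, and the function name `td_error`, the table layout, and the default `gamma` are assumptions made here.

```python
from typing import Dict, List


def td_error(
    q_values: Dict[int, List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    gamma: float = 0.9,
    terminal: bool = False,
) -> float:
    """Return the one-step Q-learning TD error for a single transition.

    delta = reward + gamma * max_a' Q(next_state, a') - Q(state, action)
    For a terminal transition the bootstrap term is dropped.
    """
    # Bootstrap from the best action value in the next state (assumed variant).
    bootstrap = 0.0 if terminal else gamma * max(q_values[next_state])
    return reward + bootstrap - q_values[state][action]


# Toy Q-table (hypothetical values): two states, two actions each.
q_table = {0: [0.0, 1.0], 1: [2.0, 0.5]}
delta = td_error(q_table, state=0, action=1, reward=1.0, next_state=1, gamma=0.5)
```

A positive value means the transition turned out better than the current Q-value predicted, and a negative value means it turned out worse.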

large language model, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2410.0128

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
South America > Peru > Loreto Department (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

RewardBench: Evaluating Reward Models for Language Modeling

Lambert, Nathan, Pyatkin, Valentina, Morrison, Jacob, Miranda, LJ, Lin, Bill Yuchen, Chandu, Khyathi, Dziri, Nouha, Kumar, Sachin, Zick, Tom, Choi, Yejin, Smith, Noah A., Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence, Jun-8-2024

Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training and understanding are sparse in the nascent open-source community around them. To enhance scientific understanding of reward models, we present RewardBench, a benchmark dataset and code-base for evaluation. The RewardBench dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety, to benchmark how reward models perform on challenging, structured and out-of-distribution queries. We create specific comparison datasets for RMs that have subtle, but verifiable reasons (e.g. bugs, incorrect facts) why one answer should be preferred to another. On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods, such as the direct MLE training of classifiers and the implicit reward modeling of Direct Preference Optimization (DPO). We present many findings on propensity for refusals, reasoning limitations, and instruction following shortcomings of various reward models towards a better understanding of the RLHF process.
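
One way to read the prompt-chosen-rejected trios described above is as a pairwise comparison task. The sketch below assumes, without confirmation from the abstract, that a reward model is counted as correct on a trio when it scores the chosen response strictly higher than the rejected one. The names `pairwise_accuracy` and `toy_reward` are hypothetical and are not part of the RewardBench code-base.

```python
from typing import Callable, Iterable, Tuple

Trio = Tuple[str, str, str]  # (prompt, chosen response, rejected response)


def pairwise_accuracy(
    trios: Iterable[Trio],
    reward_fn: Callable[[str, str], float],
) -> float:
    """Fraction of trios where reward(chosen) > reward(rejected)."""
    trios = list(trios)
    if not trios:
        raise ValueError("need at least one trio")
    wins = sum(
        1
        for prompt, chosen, rejected in trios
        if reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
    )
    return wins / len(trios)


def toy_reward(prompt: str, response: str) -> float:
    """Stand-in reward model (assumption): longer responses score higher."""
    return float(len(response))


examples = [
    ("What is 2 + 2?", "The answer is 4.", "5"),
    ("Name a prime number.", "7", "Nine is a prime number."),
]
accuracy = pairwise_accuracy(examples, toy_reward)
```

The toy reward is deliberately naive: it wins the first trio and loses the second, which illustrates why comparisons with subtle but verifiable differences are informative.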

arxiv preprint arxiv, contextualai archangel, reward model, (14 more...)

arXiv.org Artificial Intelligence

2403.13787

Country:

South America > Peru > Loreto Department > Maynas Province > Iquitos (0.04)
South America > Colombia (0.04)
South America > Brazil (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Consumer Health (1.00)
Education (0.67)
Health & Medicine > Therapeutic Area > Gastroenterology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Long-form factuality in large language models

Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng, Hu, Nathan, Huang, Jie, Tran, Dustin, Peng, Daiyi, Liu, Ruibo, Huang, Da, Du, Cosmo, Le, Quoc V.

arXiv.org Artificial Intelligence, Apr-3-2024

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
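
The extended F1 metric described in the abstract can be sketched as follows. This is a minimal illustration, assuming that precision is the share of supported facts among the facts judged, that recall is the number of supported facts divided by the preferred length K and capped at 1, and that a response with no supported facts scores 0. The function name `f1_at_k` and its parameters are chosen here for illustration and are not taken from the released code.

```python
def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    """Sketch of an F1 score with recall measured against a preferred length K.

    num_supported:      facts in the response judged as supported
    num_not_supported:  facts in the response judged as not supported
    k:                  hyperparameter for the user's preferred number of facts
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if num_supported == 0:
        # No supported facts: both precision and recall are zero (assumed convention).
        return 0.0
    precision = num_supported / (num_supported + num_not_supported)
    # Recall saturates once the response supplies at least K supported facts.
    recall = min(num_supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


# 32 supported and 32 unsupported facts against K = 64:
# precision = 0.5 and recall = 0.5, so the score is 0.5.
score = f1_at_k(32, 32, 64)
```

Under these assumptions, a response cannot raise its score by adding more than K supported facts, but every unsupported fact still lowers precision.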

factuality, individual fact, language model, (14 more...)

arXiv.org Artificial Intelligence

2403.18802

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > South Australia > Adelaide (0.14)
(50 more...)

Genre:

Research Report (1.00)
Personal > Honors (0.67)
Personal > Interview (0.47)
Personal > Obituary (0.45)

Industry:

Media > Television (1.00)
Media > Music (1.00)
Media > Film (1.00)
(21 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
